- R is very interactive: a Q&A with your data
2018-09-01
There are tools and packages around that make our life even more fun!
When you started using R, did you ever mix up
install.packages("padr")
and
library(padr)
Or did you wonder why library(padr) works, even though there is no variable called padr?
Apparently, things that ought not to work do work.
This results in a language full of magic (also in base R):
subset(mtcars, cyl == 6)
ggplot2::ggplot(mtcars, ggplot2::aes(mpg, drat)) + ggplot2::geom_point()
data.table::as.data.table(mtcars)[, mean(mpg), by = cyl]
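The library(padr) puzzle can be reproduced in base R. library itself calls as.character(substitute(package)) unless character.only = TRUE, so a bare name is never looked up as a variable. The sketch below uses the base stats package so nothing needs installing; name_as_string is a hypothetical helper that mimics the trick with substitute and deparse.

```r
# Both calls attach the same package: library() accepts a string ...
library("stats")
# ... or a bare name, which it never evaluates as a variable
library(stats)

# When the package name is stored in a variable, tell library()
# to use the variable's value instead of its name
pkg <- "stats"
library(pkg, character.only = TRUE)

# Hypothetical helper mimicking the trick: capture the unevaluated
# argument with substitute() and turn it into a string with deparse()
name_as_string <- function(x) deparse(substitute(x))
name_as_string(padr)   # "padr", even though no object padr exists
```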
R was designed to do data science (well, back then it was still called statistics).
Flexibility to maximize insights.
Enables DSL (domain-specific language) creation: tailor-made tools that solve a specific problem without overhead.
With flexibility comes ambiguity and responsibility.
my_val <- 123
my_func <- function(x) {
x / 42 * 121
}
my_func(71)
## [1] 204.5476
my_func(my_val)
## [1] 354.3571
my_func(your_val)
## Error in my_func(your_val): object 'your_val' not found
By creating a variable we assign a value to a name.
my_val <- 123
123 is the value that is bound to the name my_val.
Binding happens in an environment, in this case the global.
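Bindings and environments can be inspected directly with base R's exists, get, assign and new.env. A minimal sketch; other_env is just an illustrative name:

```r
my_val <- 123

# The binding my_val -> 123 lives in the global environment
exists("my_val", envir = globalenv())   # TRUE
get("my_val", envir = globalenv())      # 123

# assign() creates a binding explicitly, here in a fresh environment
other_env <- new.env()
assign("my_val", 456, envir = other_env)
get("my_val", envir = other_env)        # 456

# The global binding is untouched
my_val                                  # 123
```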
Just call my name, honey, I'll give you the value:
my_val
## [1] 123
R starts looking for the value of a name in the local environment.
x <- "a variable in the global"
a_func <- function() {
x <- "a variable in the local"
x
}
a_func()
## [1] "a variable in the local"
When it can't find the name locally, R moves up to the parent environment (the environment where the function was created).
z <- "a variable in the global"
another_func <- function() {
z
}
another_func()
## [1] "a variable in the global"
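The parent is determined by where the function was created, not where it is called (lexical scoping). In this sketch, outer_func is a hypothetical caller that defines its own z; another_func still finds the global one:

```r
z <- "a variable in the global"
another_func <- function() {
  z
}

outer_func <- function() {
  z <- "a variable in outer_func"
  # another_func() was created in the global environment, so its parent
  # is the global, not the local environment of outer_func()
  another_func()
}

outer_func()   # "a variable in the global"
```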
Finally, an error is thrown when the variable can't be found.
nobody_loves_me <- function() {
y
}
nobody_loves_me()
## Error in nobody_loves_me(): object 'y' not found
So this is standard evaluation in R.
When evaluating a name, R looks for the value bound to it, and errs when it can't find the value.
We can also ask R to postpone judgement by storing the request in a name object.
quote(my_unknown_var) %>% class()
## [1] "name"
This is the act of quoting, saving something to be evaluated later.
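A quoted name is a symbol object. As a small base R sketch, as.name builds the same object from a string and as.character converts it back:

```r
q <- quote(my_unknown_var)

is.name(q)                                # TRUE
identical(q, as.name("my_unknown_var"))   # TRUE: same symbol, built from a string
as.character(q)                           # "my_unknown_var"
```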
Quoted variable names are not evaluated. It doesn't matter if they don't exist.
quoted_var <- quote(wait_for_it)
quoted_var
## wait_for_it
R only starts looking for the value when we ask it to evaluate the name.
eval(quoted_var)
## Error in eval(quoted_var): object 'wait_for_it' not found
wait_for_it <- "I finally have a value"
eval(quoted_var)
## [1] "I finally have a value"
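eval also takes an envir argument that decides where the name is looked up; a list or a data frame works as well as an environment. This is the mechanism that lets eval(call, envir = x) find the columns of a data frame. A sketch, with local_env as an illustrative name:

```r
quoted_var <- quote(wait_for_it)

# Look the name up in a list instead of the calling environment
eval(quoted_var, envir = list(wait_for_it = "found in a list"))

# Or in an environment we created ourselves
local_env <- new.env()
assign("wait_for_it", "found in local_env", envir = local_env)
eval(quoted_var, envir = local_env)
```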
We can quote the following things:
name: the name of an R object
call: calling of a function
pairlist: something from the past you shouldn't bother about
literal: evaluates to the value itself
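Each of these kinds can be checked with class or typeof. A short sketch; the function passed to formals is only there to produce a pairlist (function formals are one of the few places pairlists still show up):

```r
class(quote(my_val))                       # "name"
class(quote(my_func(71)))                  # "call"
class(quote(42))                           # "numeric": a literal is just its value
typeof(formals(function(x, y = 1) NULL))   # "pairlist"
```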
Just like a name, a function call can be delayed by quoting it.
my_little_filter <- function(x,
call) {
x[eval(call, envir = x), ]
}
my_little_filter(mtcars, quote(cyl == 4)) %>% head(2)
##             mpg cyl  disp hp drat   wt  qsec vs am gear carb
## Datsun 710 22.8   4 108.0 93 3.85 2.32 18.61  1  1    4    1
## Merc 240D  24.4   4 146.7 62 3.69 3.19 20.00  1  0    4    2
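Because a quoted call is not evaluated yet, it can be inspected and modified like a list: the first element is the function, the rest are its arguments. A sketch using the same cyl == 4 condition:

```r
cl <- quote(cyl == 4)

# A call is list-like: the function first, then its arguments
cl[[1]]   # `==`
cl[[2]]   # cyl
cl[[3]]   # 4

# Change the call before anything is evaluated
cl[[3]] <- 6
cl        # cyl == 6

# Evaluate it inside mtcars, as my_little_filter does
nrow(mtcars[eval(cl, envir = mtcars), ])   # 7 cars with 6 cylinders
```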
When using a DSL, you never have to quote your function arguments yourself.
mtcars %>% select(cyl)
as.data.table(mtcars)[, cyl]
ggplot(mtcars, aes(cyl)) + geom_bar()
Why does R not throw an error? There is no cyl in the global environment…
koala <- function(x, y) {
x + 42
}
koala(3)
## [1] 45
def koala(x, y):
    return x + 42

koala(3)
## TypeError: koala() takes exactly 2 arguments (1 given)
##
## Detailed traceback:
##   File "<string>", line 1, in <module>
So R doesn't make a fuss until it really has to: a function argument is only evaluated when it is actually used (lazy evaluation).
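Lazy evaluation can be made visible: an argument that is never used is never evaluated, even if evaluating it would fail, while force evaluates it immediately. strict_koala is a hypothetical variant added for comparison:

```r
koala <- function(x, y) {
  x + 42
}
# y is never used, so stop() is never called
koala(3, stop("y is evaluated after all"))   # 45

strict_koala <- function(x, y) {
  force(y)   # evaluate y right away
  x + 42
}
# Now the missing argument is noticed and an error is raised
res <- tryCatch(strict_koala(3), error = function(e) "error: y is missing")
res   # "error: y is missing"
```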
This allows quoting inside functions.
my_second_little_filter <- function(x, bare_call) {
call <- quote(bare_call)
x[eval(call, envir = x), ]
}
my_second_little_filter(mtcars, cyl == 4) %>% head(2)
## Error in eval(call, envir = x): object 'cyl' not found
Why isn't this working?
quote literally quotes its input, so here it returns the name bare_call. We want to quote the expression that was passed as the argument, not the argument's name.
Here we need substitute:
substitute_example <- function(x) {
substitute(x)
}
substitute_example(cyl == 4)
## cyl == 4
substitute_example(cyl == 4) %>% class()
## [1] "call"
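This works because every function argument is a promise: R stores the unevaluated expression alongside the environment it came from, and only computes the value when the argument is first used.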
substitute retrieves the expression.
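substitute can also replace names in an expression using a list, and inside a function it returns an argument's expression while using the argument still yields its value. show_promise is a hypothetical helper for illustration:

```r
# Replace a and b in an expression without evaluating anything
substitute(a + b, list(a = 1, b = quote(z)))   # 1 + z

show_promise <- function(x) {
  expr <- substitute(x)   # the expression the caller wrote
  list(expr = expr,
       value = x)         # using x forces the promise and gives the value
}
show_promise(1 + 2)   # expr: 1 + 2, value: 3
```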
my_correct_second_little_filter <- function(x, bare_call) {
call <- substitute(bare_call)
x[eval(call, envir = x), ]
}
my_correct_second_little_filter(mtcars, cyl == 4) %>% head(1)
##             mpg cyl disp hp drat   wt  qsec vs am gear carb
## Datsun 710 22.8   4  108 93 3.85 2.32 18.61  1  1    4    1
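cyl == 4 on its own is invalid: there is no cyl variable in the global environment.
substitute retrieves just the expression, which is the quoted call.
eval then evaluates that call inside x, which does contain a cyl variable.
Code can also be built from a string: parse turns text into an expression that eval can run.
func1 <- function() "Calling function 1"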
func2 <- function() "Calling function 2"
func_caller <- function(nr) {
eval(parse(text = paste0("func", nr)))()
}
func_caller(1)
## [1] "Calling function 1"
func_caller(2)
## [1] "Calling function 2"
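get looks an object up by the string of its name, so the same dispatch also works without parse. This is a swapped-in alternative to the parse/eval approach, shown for comparison; func_caller_get is a hypothetical name:

```r
func1 <- function() "Calling function 1"
func2 <- function() "Calling function 2"

func_caller_get <- function(nr) {
  # get() finds an object by its name given as a string;
  # mode = "function" ignores non-function objects with that name
  f <- get(paste0("func", nr), mode = "function")
  f()
}

func_caller_get(1)   # "Calling function 1"
func_caller_get(2)   # "Calling function 2"
```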
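# Cache the result of query_<nr> (assumed to exist) in data/source_data_<nr>.Rdata
get_source_data <- function(nr, rerun = FALSE) {
  obj_name  <- paste0("source_data_", nr)
  file_path <- paste0("data/", obj_name, ".Rdata")
  if (file.exists(file_path) && !rerun) {
    # load() restores source_data_<nr> into this function's environment
    load(file_path)
  } else {
    # evaluate the code "query_<nr>" and bind its value to source_data_<nr>
    assign(obj_name, eval(parse(text = paste0("query_", nr))))
    save(list = obj_name, file = file_path)
  }
  # return source_data_<nr> by evaluating its name
  eval(parse(text = obj_name))
}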
The tidyverse is implemented using non-standard evaluation (NSE).
mtcars %>% select(cyl)
We now know that select somehow quotes cyl and evaluates it within the data frame.
But what if we want to wrap tidyverse code in a custom function?
This won't work:
my_grouping_func <- function(x, grouping_var) {
x %>%
group_by(grouping_var) %>%
summarise(max_drat = max(drat))
}
my_grouping_func(mtcars, cyl)
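Why? group_by quotes its argument, so it sees the name grouping_var instead of cyl. There is no column called grouping_var, so the name is looked up in the function's environment, where evaluating the argument cyl fails.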
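The same failure can be reproduced in base R with nested substitute calls. inner_quoter, outer_wrapper and outer_fixed are hypothetical helpers: the wrapper passes on its own argument name, while the fixed version first substitutes the caller's expression into the call and then evaluates it, similar in spirit to what !! does:

```r
# Quotes whatever expression it receives, like group_by() does
inner_quoter <- function(arg) substitute(arg)

# Passes its argument name straight through
outer_wrapper <- function(grouping_var) inner_quoter(grouping_var)
outer_wrapper(cyl)   # grouping_var, not cyl

# Build the call inner_quoter(cyl) first, then evaluate it
outer_fixed <- function(grouping_var) {
  eval(substitute(inner_quoter(grouping_var)))
}
outer_fixed(cyl)     # cyl
```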
In order to get it to work:
my_grouping_func <- function(x, grouping_var) {
x %>%
group_by(!!grouping_var) %>%
summarise(max_drat = max(drat))
}
my_grouping_func(mtcars, quo(cyl))
## # A tibble: 3 x 2
##     cyl max_drat
##   <dbl>    <dbl>
## 1     4     4.93
## 2     6     3.92
## 3     8     4.22
Just like with substitute, you can quote the expression passed as an argument with enquo.
my_grouping_func <- function(x, grouping_var) {
grouping_var_q <- enquo(grouping_var)
x %>%
group_by(!!grouping_var_q) %>%
summarise(max_drat = max(drat))
}
my_grouping_func(mtcars, cyl)
## # A tibble: 3 x 2
##     cyl max_drat
##   <dbl>    <dbl>
## 1     4     4.93
## 2     6     3.92
## 3     8     4.22
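Since rlang 0.4.0 (released after this post), the enquo plus !! pair can be written as a single {{ }} ("curly-curly") operator. A sketch, assuming dplyr and a recent enough rlang are installed:

```r
library(dplyr)

# {{ grouping_var }} quotes the caller's expression and unquotes it
# in one step, equivalent to !!enquo(grouping_var)
my_grouping_func <- function(x, grouping_var) {
  x %>%
    group_by({{ grouping_var }}) %>%
    summarise(max_drat = max(drat))
}

my_grouping_func(mtcars, cyl)
```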